Abstract. We present a simple way of coding and compressing the data
on board the Planck instruments (HFI and LFI) to address the problem
of on-board data reduction. This is a critical issue in the Planck
mission. The total information that can be downloaded to Earth is
severely limited by the telemetry allocation. This limitation could
reduce the amount of diagnostics sent on the stability of the
radiometers and, as a consequence, curb the final sensitivity of the
CMB anisotropy maps. Our proposal to address this problem consists in
taking differences of consecutive circles at a given sky pointing. To
a good approximation, these differences are independent of the
external signal, and are dominated by thermal (white) instrumental
noise. Using simulations and analytical predictions we show that high
compression rates, $c_r \simeq 10$, can be obtained with minor or zero
loss of CMB sensitivity. Possible effects of digital distortion are
also analyzed. The proposed scheme allows for flexibility to optimize
the relation with other critical aspects of the mission. Thus, this
study constitutes an important step towards a more realistic modeling
of the final sensitivity of the CMB temperature anisotropy maps.

Key words: cosmology: cosmic microwave background - 
cosmology: observations - Methods: statistical - Methods: 
data analysis - Techniques: miscellaneous 



1. Introduction 

The PLANCK Satellite is designed to measure temper- 
ature fluctuations in the Cosmic Microwave Background 
(CMB) with a precision of $\sim 2\,\mu$K and an angular resolution
of about 5 arcminutes. The payload consists of a 1.5-2.0
m Gregorian telescope which feeds two instruments: the 

* Presently on leave at: Department of Mathematics, 
Room 2-363A, Massachusetts Institute of Technology, 77 Mas- 
sachusetts Av., Cambridge, MA 02139-4307, USA. 



High Frequency Instrument (HFI), with 56 bolometer arrays operated at
0.1 K and frequencies of 100-850 GHz, and the Low Frequency Instrument
(LFI), with 56 tuned radio receiver arrays operated at 20 K (4 K) and
frequencies of 30-100 GHz (see
http://astro.estec.esa.nl/SA-general/Projects/Planck/ for more
information).

Data on board PLANCK consist of $N$ differential temperature
measurements, spanning a range of values we shall call $\mathcal{R}$.
Preliminary studies and the telemetry allocation indicate the need for
compressing these data by a ratio of $c_r \simeq 10$. Here we consider
under what conditions it might be possible to achieve such a large
compression factor.

A discretized data set can be represented by a number of bits,
$n_{bits}$, which for linear analogue-to-digital converters (ADC) is
typically given by the maximum range $N_{max}$:
$n_{bits} = \log_2 N_{max}$. If we express the joint probability for a
set of $N$ measurements as $p_{i_1,\ldots,i_N}$, the Shannon entropy
per component of the data set is:

$$ h = -\frac{1}{N} \sum_{i_1,\ldots,i_N} p_{i_1,\ldots,i_N} \log_2 p_{i_1,\ldots,i_N}. \tag{1} $$

Shannon's theorem states that $h$ is a lower bound to the average
length of the code units. We will therefore define the theoretical
(optimal) compression rate as

$$ c_{r,opt} \equiv \frac{n_{bits}}{h}. \tag{2} $$

For a uniform distribution of $N$ measurements we have $p_i = 1/N$ and
$h = \log_2 N$, which equals the number of bits per data value. Thus
it is not possible to compress a (uniformly) random distribution of
measurements.
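As an illustration of Eqs. (1) and (2), the following minimal Python sketch estimates the entropy per sample from the empirical histogram of a discrete data set. It assumes independent samples, so that the joint probability factorizes, and the helper names `shannon_entropy_bits` and `optimal_compression` are ours and purely illustrative.

```python
import math
from collections import Counter


def shannon_entropy_bits(samples):
    """Empirical Shannon entropy, in bits per sample, of discrete values.

    Assumes independent samples, so only the one-point histogram is needed.
    """
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def optimal_compression(n_bits, entropy_bits):
    """Theoretical compression rate n_bits / h, as in Eq. (2)."""
    return n_bits / entropy_bits


# Uniform case: each of the 2**4 = 16 levels occurs equally often.
uniform = list(range(16)) * 10
h_uniform = shannon_entropy_bits(uniform)

# Peaked case: most samples fall on a few levels, so fewer bits are needed.
peaked = [0] * 120 + [1] * 20 + [2] * 20
h_peaked = shannon_entropy_bits(peaked)

print(f"uniform: h = {h_uniform:.3f} bits, c_r = {optimal_compression(4, h_uniform):.2f}")
print(f"peaked:  h = {h_peaked:.3f} bits, c_r = {optimal_compression(4, h_peaked):.2f}")
```

In the uniform case the entropy equals the 4 bits used per value, so no compression is possible, whereas the peaked histogram can be coded with far fewer bits per sample.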

Gaztañaga et al. (1998) have argued that a well calibrated signal will
be dominated by thermal (white) noise in the instrument,
$\sigma_e \simeq \sigma_T$, and therefore suggested that the digital
resolution $\Delta$ only needs to be as small as the instrumental RMS
white noise: $\Delta \simeq \sigma_T \simeq 2$ mK. The nominal $\mu$K
pixel sensitivity will only be achieved after averaging (on Earth).
This yields compression rates of $c_{r,opt} \simeq 8$. On the other
hand, Maris et al. (1999) have used the same formalism as Gaztañaga et
al. but fixing the final dynamical range $\mathcal{R}$ to some
fiducial values, so that the digital resolution is then given by
$\Delta = \mathcal{R}/2^{n_{bits}}$, independently of $\sigma_T$.
Again, assuming a well calibrated signal dominated by thermal (white)
noise, this approach yields smaller compression rates of
$c_{r,opt} \simeq 4$, as is obvious from the fact that the digital
resolution is larger ($\Delta$ is smaller). In both cases, the effect
of the CMB signal (e.g. the dipole) and other sources (such as the
Galaxy) has been ignored.

Several questions arise from these studies. What is the optimal value
of $\Delta$, and what are the penalties (distortions) involved when
using large values of $\Delta$? Moreover, can the data gathered by the
on-board instruments really be modeled as a white noise signal? In
other words, are the departures from Gaussianity (due to the Galactic,
foreground, dipole and CMB signals) important? This latter question is
closely related to the way data will be processed (and calibrated) on
board, for example whether and how the dipole is going to be used for
calibration. These issues, together with the final instrument
specifications, seem to play an important role in the final range of
values $\mathcal{R}$ and, therefore, in the possible compression
rates. This is somewhat unfortunate, as compression would then be
related in a rather complicated way to the nature of the external
signal and also to critical issues of the internal data processing.

Here we shall present a simple way of coding the on 
board data that will solve the lossless compression prob- 
lem in a much simpler way. This will be done indepen- 
dently of the internal calibration or the nature of the ex- 
ternal signal (CMB or otherwise). We will also address the 
issue of the digital distortion introduced (the penalty) as 
a function of the final compression (the prize). 

In §2 we give a summary of some critical issues related to the
on-board data. Our coding and compression proposals are presented in
§3, while simulations are discussed in §4. We end with some concluding
remarks.

2. ON BOARD DATA 

2.1. Data Rate, Telemetry and Compression 

To illustrate the nature of the compression problem we 
first give some numbers related to the LFI. Similar esti- 
mations apply to the HFI. 

According to the PLANCK LFI Scientific and Technical Plan (Part I,
§6.3, Mandolesi et al. 1998) the raw data rate of the LFI is
$r_d \sim 260$ Kb s$^{-1}$. This assumes: i) a sampling interval of
6.9 ms, or $f_{sampl} = 144.9$ Hz, which corresponds to 2.5 arcmin in
the sky, 1/4 of the FWHM at 100 GHz; ii) $N_{detec} = 112$ detectors:
sky and reference load temperature for 56 radiometers; iii)
$n_{bits} = 16$ bits data representation. Thus the raw data rate is:

$$ r_d = f_{sampl} \times N_{detec} \times n_{bits} \simeq 259.7\ \mathrm{Kb\,s^{-1}}. \tag{3} $$



GHz      FWHM    $\sigma_T$ (mK)    T (mK)         Det.    Kb s$^{-1}$
30       33'     2.8                -30 to 61      4       9.3
44       23'     3.2                -30 to 138     6       13.9
70       14'     4.1                -20 to 340     12      27.8
100      10'     5.1                -10 to 667     34      78.8
TOTAL                               -30 to 667     56      130
+LOAD                                              112     260

Table 1. Parameters for the radiometers: a) central frequency $\nu$
(bandwidth is 20%); b) angular resolution (beam FWHM); c) RMS thermal
noise expected at 6.9 ms (144.9 Hz) sampling; d) range of temperatures
expected from the sky (Jupiter, dipole, S-Z); e) number of detectors
(2 x horns); f) total data rate at 6.9 ms (2.5 arcmin).



The values for each channel are shown in Table 1. A factor 
of two reduction can be obtained by only transmitting 
the difference between sky and reference temperature. To 
allow for the recovery of diagnostic information on the 
separate stability of the amplifiers and loads, the full sky 
and reference channels of a single radiometer could be sent 
at a time, changing the selected radiometer from time to 
time to cover all channels (Mandolesi et al. 1998). 
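The data rates of Eq. (3) and Table 1 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes 1 Kb = 1000 bits, and the names `DETECTORS` and `rate_kbps` are ours.

```python
# Sampling frequency and word length quoted in the text.
F_SAMPLE_HZ = 144.9
N_BITS = 16

# Number of sky detectors per channel (GHz -> detectors), from Table 1.
DETECTORS = {30: 4, 44: 6, 70: 12, 100: 34}


def rate_kbps(n_detectors, f_sample_hz=F_SAMPLE_HZ, n_bits=N_BITS):
    """Raw data rate in Kb/s (assuming 1 Kb = 1000 bits)."""
    return f_sample_hz * n_detectors * n_bits / 1000.0


sky_rates = {ghz: rate_kbps(n) for ghz, n in DETECTORS.items()}
total_sky = rate_kbps(sum(DETECTORS.values()))
# Sending the reference load as well doubles the number of data streams.
total_with_load = rate_kbps(2 * sum(DETECTORS.values()))

for ghz, rate in sky_rates.items():
    print(f"{ghz:>4} GHz: {rate:6.1f} Kb/s")
print(f"sky only:   {total_sky:6.1f} Kb/s")
print(f"sky + load: {total_with_load:6.1f} Kb/s")
```

Transmitting only the sky minus reference difference corresponds to the "sky only" figure, i.e. the factor of two reduction mentioned above.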

Note that the sampling resolution of 6.9 ms corresponds to 2.5 arcmin
in the sky, which is smaller than the nominal FWHM resolution.
Adjacent pixels in a circle could be averaged on board to obtain the
nominal resolution (along the circle direction). In this case the
pixel size should still be at least $\sim 2.5$ times smaller than the
FWHM to allow for a proper map reconstruction. Note that consecutive
circles in the sky will be separated by about 2.5', so even after this
averaging along the circle scan there is still a lot of redundancy
across circles. For pixels of size $\theta \sim$ FWHM/2.5 along the
circle scan the total scientific rate could be reduced to
$r_d \sim 67$ Kb s$^{-1}$ (or 134 Kb s$^{-1}$ with some subset
information of the reference load).

The telemetry allocation for the LFI scientific data is expected to be
$r_t = 20$ Kb s$^{-1}$. Thus the target compression rates are about:

$$ c_r = \frac{r_d}{r_t} \simeq 3 - 13, \tag{4} $$

depending on the actual on-board processing and requirements.
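The range quoted in Eq. (4) follows from dividing each candidate data rate by the telemetry allocation. A minimal sketch, in which the scenario labels are ours and the rates are the ones quoted in the text:

```python
# Telemetry allocation for the LFI scientific data, in Kb/s.
TELEMETRY_KBPS = 20.0

# Candidate on-board data rates in Kb/s (labels are illustrative).
scenarios = {
    "averaged pixels, sky-load difference only": 67.0,
    "averaged pixels, with reference-load subset": 134.0,
    "raw sky + reference load": 260.0,
}

# Required compression rate for each scenario.
targets = {name: rate / TELEMETRY_KBPS for name, rate in scenarios.items()}

for name, c_r in targets.items():
    print(f"{name}: c_r = {c_r:.1f}")
```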

2.2. Scanning and Data Structure 

The Planck satellite spins with a frequency $f_{spin} = 1$ rpm so that
the telescope (pointing at approximately right angles to the spin
axis) sweeps out great circles in the sky. Each circle is scanned at
the same position in the sky $\theta$ for over 2 hours, so that there
are 120 images of the same pixel (the final number might be different
but this is not relevant here). We can write the whole data in each
pointing $\theta$ as a matrix:

$$ D_{k,\alpha}(\theta) = S_{k,\alpha}(\theta) + \eta_{k,\alpha}, \tag{5} $$

where $S$ stands for the external signal (CMB, Galaxy, foregrounds)
and $\eta$ stands for the internal (e.g. instrumental) noise. The
index $k$ labels the spins in that pointing and $\alpha$ labels the
positions within the circle. Each measurement is dominated by
instrumental noise, $\sigma_T \sim 2$ mK (see Table 1), rather than by
the CMB noise ($\sigma_{CMB} \simeq 10^{-2}$ mK). If this noise (at
frequencies smaller than $f_{spin}$) were mostly thermal, one could
say that there is no need for compression, as we could just average
those 120 images of a given pixel in the sky and only send the mean
down to Earth. The problem is that one expects 1/f instabilities to
dominate the instrument noise at frequencies smaller than $\sim 0.1$
Hz. Compression is only required when we want to keep these 120 images
in order to correct for the instrument instability in the data
reduction process (on Earth).

2.3. Dynamic Range & Sensitivity 

The rms standard deviation level in the CMB anisotropies is expected
to be of a few tens of $\mu$K. These anisotropies will be mapped with
a $\sim 1\,\mu$K resolution. But the final dynamic range for the
measured temperature differences per angular resolution pixel will be
$\Delta T \sim 1\,\mu$K - 1 K. The maximum resolution of
$\simeq 1\,\mu$K will only be obtained after averaging all data. The
highest value, $\sim 1$ K, is given by the hottest source that we want
to keep (not saturated) at any of the frequencies. Positive signals
from Jupiter, which will be used for calibration, can be as large as
$\sim 0.1$ K at 100 GHz. Other point sources and the Galaxy give
intermediate positive values. Negative differences (with respect to
the mean CMB $T \sim 2.7$ K), of the order of a few mK, can originate
from the dipole, i.e. the relative velocity between the satellite and
the CMB rest frame. The Sunyaev-Zeldovich effect can also give a
negative signal of a few tens of mK. Thus the overall range of
external temperature differences could be $-30$ mK to 1 K. The
internal reference load will also be subject to variations which have
to be characterized. The instrument dynamic range will depend on the
final design of the radiometers and their internal calibration. This
is not well understood yet and it is therefore difficult to assess how
it will affect the on-board information content.

Planck LFI radiometers are modified Blum correlation receivers (see
Blum 1959). Both LFI and HFI radiometers have an ideal white noise
sensitivity of

$$ \delta T \simeq \frac{\sigma_T}{\sqrt{N}}, \tag{6} $$

where $\sigma_T$ is a characteristic rms noise temperature. The values
of $\sigma_T$ (shown in Table 1) correspond to the equivalent noise in
a sampling interval, and $N$ above is the number of such samplings (or
pixels) at a given sky position. The final target sensitivity required
by the Planck mission to "answer" many of our cosmological questions
about the CMB is about $\delta T_{CMB} \simeq 10^{-6}$ K. Thus, we
need to integrate over about $N \sim 10^6$ elements (i.e. pixels) with
the thermal noise shown in Table 1. This, of course, is just an order
of magnitude estimate, as the detailed calculation requires a careful
consideration of the removal of instrument instabilities and the use
of multiple frequency bands to subtract the different contaminants.

As pointed out by Herreros et al. (1997), the temperature digital
resolution should be given by the receiver noise $\sigma_T$ on the
sampling time of 6.9 ms (or the corresponding value if there is some
on-board averaging) and not by the final target sensitivity. At the
end of the mission, each FWHM pixel will have been measured many
($\sim 10^6$) times. Thus a higher resolution of
$\Delta T \sim 1\,\mu$K is not necessary on board, given that the raw
signal is dominated by the white noise component. This higher
resolution will be obtained later by the pixel averaging (data
reduction on Earth). Using an unnecessarily high on-board temperature
resolution (i.e. a small $\Delta$) will result in a larger Shannon
entropy (e.g. $h \propto \log_2(1/\Delta)$), which will limit even
more the amount of scientific and diagnostic information that can be
downloaded to Earth.



2.4. Instrumental Noise & Calibration 

We can distinguish two basic components of the receiver noise: the
white or thermal noise, and the instabilities or calibration gains
(like the 1/f noise). An example is given by the following power
spectrum as a function of frequency $f$:

$$ P(f) \propto 1 + \frac{f_{knee}}{f}, \tag{7} $$

with an amplitude set by the integration time $\tau$ and the bandwidth
$\nu$ (about 20% of the central frequency of the channel for the LFI).



The 'knee' frequency, $f_{knee}$, is expected to be
$f_{knee} \simeq 0.005$ Hz for a 4 K load or $f_{knee} \simeq 0.06$ Hz
for a 20 K load. The expected RMS thermal noise,
$\sigma_T \propto 1/\sqrt{\nu\tau}$, at the sampling frequency (2.5
arcmin), is listed in Table 1. The lowest value is given by the 30 GHz
channel and could be further reduced to $\sim 1$ mK if the data are
averaged to FWHM/2.5 to obtain the nominal resolution. The larger
values in the dynamical range can be affected by the calibration
gains. This is important and should be carefully taken into account if
a non-linear ADC is used, as gains could then change the relative
significance of measurements (e.g. less significant bits shifting
because of gains). In fact, a 1/f power spectrum integrated from the
knee frequency ($f_{knee}$) for a time $T$ gives an rms noise that
diverges with $T$. The integration (or sampling) over a single
pixel could be modeled as some (effective) sharp-k window of size
$f_{max}$:

$$ \sigma_{1/f}^2 = \frac{\sigma_T^2}{f_{max}} \int_{1/T}^{f_{max}} \frac{f_{knee}}{f}\,df = \sigma_T^2\,\frac{f_{knee}}{f_{max}}\,\ln(T f_{max}). \tag{8} $$

For a $T \sim 1$ year mission, the contribution from the 1/f noise in
pixels averaged after successive pointings has
$f_{max} \simeq 10^{-4}$ Hz, and we have
$\sigma_{1/f}^2 \simeq 10^{3}\,\sigma_T^2$! This illustrates
why the calibration problem is so important and makes a large dynamic
range desirable. Averaging pixels at the spin rate,
$f_{max} \simeq f_{spin}$, gives $\sigma_{1/f}^2 \sim 10\,\sigma_T^2$.
This is not too bad for the dynamic range, but it corresponds to a
mean value and there could be more important instantaneous or temporal
gains. Drifts with periods longer than the spin period (1 rpm) can be
removed by requiring that the average signal over each rotation at the
same pointing remains constant. Drifts between pointings (after 2
hours) could be reduced by using the overlapping pixels. All this
could easily be done on board, while a more careful matching is still
possible (and necessary) on Earth. This allows the on-board gain to be
calibrated on timescales larger than 1 min with an accuracy given by
$\sigma_T$. Additional and more careful in-flight calibration can also
be done using the signal from external planets and the CMB dipole.
Although this is an interesting possibility for the on-board
reduction, we will present below a simpler and more efficient
alternative.
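To get a feeling for the orders of magnitude involved, the sketch below evaluates the 1/f contribution relative to the white noise variance. It assumes that Eq. (8) takes the form $\sigma_{1/f}^2/\sigma_T^2 = (f_{knee}/f_{max}) \ln(T f_{max})$, which is our reading of the sharp window model rather than a confirmed expression, and it takes a one year mission as $3.15 \times 10^7$ s. The function name is ours.

```python
import math


def one_over_f_variance_ratio(f_knee_hz, f_max_hz, mission_s):
    """sigma_{1/f}^2 / sigma_T^2 under the ASSUMED form of Eq. (8):
    (f_knee / f_max) * ln(T * f_max)."""
    return (f_knee_hz / f_max_hz) * math.log(mission_s * f_max_hz)


YEAR_S = 3.15e7          # one year in seconds (approximate)
F_SPIN_HZ = 1.0 / 60.0   # 1 rpm
F_POINTING_HZ = 1.0e-4   # order of magnitude for 2 hour pointings

ratios = {}
for load, f_knee in (("4 K load", 0.005), ("20 K load", 0.06)):
    ratios[load] = (
        one_over_f_variance_ratio(f_knee, F_SPIN_HZ, YEAR_S),
        one_over_f_variance_ratio(f_knee, F_POINTING_HZ, YEAR_S),
    )
    spin, pointing = ratios[load]
    print(f"{load}: spin rate -> {spin:.0f}, pointing rate -> {pointing:.0f}")
```

Under these assumptions the ratio is of order 10 when averaging at the spin rate and a few hundred to a few thousand when averaging over pointings, depending on the knee frequency.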

3. CODING & COMPRESSION

We will assume from now on that the external signal does not vary
significantly with time during a spin period (1 minute), i.e.
$S_{k,\alpha}(\theta) \simeq S_{k+1,\alpha}(\theta)$, so that Eq.[5]
yields:

$$ D_{k+1,\alpha}(\theta) \simeq S_{k,\alpha}(\theta) + \eta_{k+1,\alpha}. \tag{9} $$

Consider now the differences $\delta$ between the circle scans in two
consecutive spins of the satellite:

$$ \delta_{k,\alpha}(\theta) = D_{k+1,\alpha}(\theta) - D_{k,\alpha}(\theta) \simeq \eta_{k+1,\alpha} - \eta_{k,\alpha}. \tag{10} $$

These differences are independent of the signal $S_\alpha(\theta)$ and
are just given by a combination of the noise $\eta$. Obviously the
above operation does not involve any information loss, as the set of
original data images ($D_{k,\alpha}$, $k = 1, \ldots, 120$) can be
recovered from one of the full circles, say $D_{1,\alpha}$, and the
rest of the differences ($\delta_{k,\alpha}$, $k = 1, \ldots, 119$).
Occasionally, the external signal could vary significantly during 1
minute (e.g. cosmic rays, a variable star or some outbursts). This
will not result in any loss of information but will change the
statistics (and therefore the compressibility) of the differences
$\delta_{k,\alpha}$. Here we assume that the overall statistics are
dominated by the instrumental noise. A more detailed study will be
presented elsewhere.
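The coding step can be sketched in a few lines of Python. This is only an illustration of the idea, not flight software: circles are lists of integer ADC counts, the signal is assumed constant across spins, and the function names `encode_circles` and `decode_circles` are ours.

```python
import random


def encode_circles(circles):
    """Keep the first circle and the differences between consecutive circles."""
    first = list(circles[0])
    diffs = [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(circles, circles[1:])
    ]
    return first, diffs


def decode_circles(first, diffs):
    """Recover every circle by accumulating the differences."""
    circles = [list(first)]
    for d in diffs:
        circles.append([x + dx for x, dx in zip(circles[-1], d)])
    return circles


# Toy data: a fixed sky signal per pixel plus independent integer noise per spin.
rng = random.Random(0)
n_spins, n_pix = 5, 8
signal = [rng.randint(-1000, 1000) for _ in range(n_pix)]
noise = [[rng.randint(-3, 3) for _ in range(n_pix)] for _ in range(n_spins)]
circles = [[s + n for s, n in zip(signal, row)] for row in noise]

first, diffs = encode_circles(circles)
# The differences only contain noise; the sky signal has cancelled out.
noise_diffs = [[b - a for a, b in zip(p, c)] for p, c in zip(noise, noise[1:])]

print("lossless:", decode_circles(first, diffs) == circles)
print("signal-free differences:", diffs == noise_diffs)
```

Since integer counts are used, the reconstruction is exact, and the differences have a small dynamic range compared with the original circles.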

What we propose here is to compress the above noise differences
$\delta_{k,\alpha}(\theta)$ before downloading them to Earth. This has
several advantages over the direct compression of the $D_{k,\alpha}$:

- The $\delta_{k,\alpha}$ are independent of the input signals, which
  are in general non-Gaussian (e.g. Galaxy, foregrounds, planets...).

- The new quantity to be compressed should approach a (multivariate)
  Gaussian, as it is just instrumental noise.

- This scheme is independent of any on-board calibration or
  processing.

- $\delta_{k,\alpha}$ should be fairly homogeneous (the radiometers
  are supposed to be fairly stable over time scales of 1 minute), so
  that compression rates should be quite uniform.

- For the reasons above there is a lot of flexibility in data size and
  processing requirements. For the raw data rate of $\sim 260$
  Kb s$^{-1}$ estimated in Table 1, it will take about $\sim 2$ Mbytes
  to store a full revolution. Thus, compression of a few circles at a
  time might be possible with $\sim 16$ Mbytes of on-board RAM.

- The resulting processing will be signal lossless even if the noise
  is binned with a low resolution before compression. This is not
  clear when the $D_{k,\alpha}$ are used instead.

Regarding the last point, digital binning of the noise
$\delta_{k,\alpha}$ could affect the final sensitivity of the mission
by introducing additional digital distortion or discretization noise,
which could add to the instrumental noise in a significant way. We
will quantify this later.

We will further assume that the noise $\eta_{k,\alpha}$ in Eq.[10] is
not a function of the position in the sky but just a function of time.
Thus we will assume that the $\eta_{k,\alpha}$ are a realization of a
stochastic (multivariate) Gaussian process with a given power spectrum
$P(f)$, e.g. Eq.[7]. We then have:

$$ \delta_{k,\alpha} = \eta_{k+1,\alpha} - \eta_{k,\alpha}, \tag{11} $$

so that $\delta_{k,\alpha}$ will also be Gaussian, but with a
different power spectrum. To a good approximation the noise $\delta$
will be almost white or thermal, as differences between components of
adjacent vectors (circles) are separated by 1 min
($f_{spin} \simeq 0.02$ Hz), which is comparable to or larger than the
typical $f_{knee}$ frequencies ($f_{knee} \simeq 0.005$ Hz for a 4 K
load in the LFI). From now on we will assume, for the sake of
simplicity, that $\delta$ is purely white noise with
$\sigma_\delta \simeq \sqrt{2}\,\sigma_T$ ($\sim 3$ mK). Deviations
from this assumption are studied in Appendix A.
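The factor $\sqrt{2}$ follows directly from Eq.[11] if the noise in consecutive circles is taken to be uncorrelated with equal variance $\sigma_T^2$, which is the white noise assumption made above. A short worked version:

```latex
\begin{align}
\sigma_\delta^2
  &= \langle (\eta_{k+1,\alpha} - \eta_{k,\alpha})^2 \rangle \\
  &= \langle \eta_{k+1,\alpha}^2 \rangle
   + \langle \eta_{k,\alpha}^2 \rangle
   - 2\,\langle \eta_{k+1,\alpha}\,\eta_{k,\alpha} \rangle \\
  &\simeq \sigma_T^2 + \sigma_T^2 - 0 = 2\,\sigma_T^2 ,
\end{align}
```

so that $\sigma_T \simeq 2$ mK gives $\sigma_\delta \simeq 2.8$ mK, i.e. about 3 mK.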

To estimate the entropy associated with $\delta_{k,\alpha}$ and its
corresponding $c_{r,opt}$ in Eq.[2] we need to know how $\delta$ is
discretized, i.e. what $\Delta$ is in Eq.[18]. This value will in
principle be given by the ADC hardware: $\Delta_{ADC}$. The details of
the ADC in each instrument will be driven by the electronics, the
final target temperature range $\mathcal{R}$ and the internal
calibration processes.

3.1. Digital Distortion 

In order to make quantitative predictions we need to know the ADC
details, i.e. how the on-board signal will be digitized. To start
with, we will take the digital resolution $\Delta$ to be a variable.
The noise differences could be subject to a further (on-board)
digitization (in general with the possibility of
$\Delta \neq \Delta_{ADC}$). This would allow the compression target
to be independent of other mission-critical points. If the ADC digital
resolution is significantly larger than the value of $\Delta$ under
consideration ($\Delta_{ADC} < \Delta$), the binned data will suffer
an additional digital distortion, which will add to the standard ADC
distortion (which will probably be given by other instrumental
considerations). In general we represent the overall digital
distortion by $\mathcal{D}$, which is defined as

$$ \mathcal{D} \equiv \frac{\langle (\hat\delta - \delta)^2 \rangle}{\sigma^2}, \tag{12} $$

where $\hat\delta$ is the discretized version of $\delta$ and
$\langle\ldots\rangle$ is the mean over a given realization. It is
well known (see e.g. §5 in Gersho & Gray 1992) that in the limit of
small $\hat\Delta \equiv \Delta/\sigma$, the digital distortion of a
signal is simply given by

$$ \mathcal{D} = \frac{D_{err}^2}{\sigma^2} \simeq \frac{\Delta^2}{12\,\sigma^2} = \frac{\hat\Delta^2}{12}, \tag{13} $$

where $D_{err}^2 \equiv \langle (\hat\delta - \delta)^2 \rangle$, i.e.
$\mathcal{D}$ is fixed by the digital resolution in units of the rms
(white noise) deviation. The rms $\hat\sigma$ of the discretized
version of $\delta$, which we shall call $\hat\delta$, is given by

$$ \hat\sigma^2 = \sigma^2 + D_{err}^2 + 2\,\langle \epsilon\,\delta \rangle \simeq \sigma^2 \left( 1 + \frac{\hat\Delta^2}{12} \right), \tag{14} $$

where $\epsilon \equiv \hat\delta - \delta$ and
$\langle\epsilon\,\delta\rangle$ denotes the correlation between this
quantity and $\delta$, which is usually small. The discretized field
has a larger rms deviation than the original one.

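As mentioned in §2.3, the final signal sensitivity, $\delta T_{CMB}$,
of the survey will only be achieved on Earth after averaging many
observations, destriping, Galaxy and foreground removal, etc. Eq.(6)
shows that this sensitivity should be proportional to a combination of
the thermal noises of each instrument, $\propto \sigma_T$, and,
therefore, to $\sigma_\delta$. Thus, the relative effect of the
discretization on the mission sensitivity is just given by the ratio

$$ \frac{\delta\hat T_{CMB}}{\delta T_{CMB}} - 1 = \frac{\hat\sigma_\delta}{\sigma_\delta} - 1 = \sqrt{1 + \frac{D_{err}^2}{\sigma_\delta^2} + \frac{2\,\langle\epsilon\,\delta\rangle}{\sigma_\delta^2}} - 1 $$
$$ \simeq \sqrt{1 + \frac{\hat\Delta^2}{12}} - 1, \tag{15} $$

where $\delta\hat T_{CMB}$ is the sensitivity obtained from the
discretized data. The approximate form is valid for small
$\hat\Delta$, and comes from taking the approximation for $D_{err}$
from Eq. (13) and neglecting $\langle\epsilon\,\delta\rangle$. For
example, for $\Delta \simeq \sigma_\delta$, we have a 4% relative
decrease in the sensitivity, i.e.
$\delta\hat T_{CMB}/\delta T_{CMB} - 1 \simeq 0.04$ within this
approximation (see §4). This loss of sensitivity only affects the
noise (not the signal) and could be partially (or mostly) the result
of the ADC hardware requirements, rather than the compression process
itself.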
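The small resolution predictions above are easy to check with a short Monte Carlo. The sketch below assumes unit variance Gaussian white noise, a linear quantizer that rounds to the nearest multiple of the step, and a sample size of our choosing; the function names are ours.

```python
import math
import random


def quantize(values, step):
    """Linear quantizer: round each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]


def distortion_stats(delta_hat, n=100_000, seed=1):
    """Return (mean squared error / variance, relative rms increase)
    for unit-variance Gaussian noise quantized with step delta_hat."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xq = quantize(x, delta_hat)
    mse = sum((q - v) ** 2 for q, v in zip(xq, x)) / n
    var = sum(v * v for v in x) / n
    var_q = sum(q * q for q in xq) / n
    return mse / var, math.sqrt(var_q / var) - 1.0


rel_mse, sens_loss = distortion_stats(1.0)
print(f"measured  D_err^2/sigma^2 = {rel_mse:.4f}, rms increase = {sens_loss:.4f}")
print(f"predicted D_err^2/sigma^2 = {1 / 12:.4f}, "
      f"rms increase = {math.sqrt(1 + 1 / 12) - 1:.4f}")
```

For a step equal to the rms, the measured values should sit close to $1/12 \simeq 0.083$ and to the 4% sensitivity change quoted in the text.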



3.2. On Board Compression 

Romeo et al. (1998) have presented a general study of (correlated
multi-Gaussian) noise compression by studying Shannon entropies per
component $h$, and therefore the optimal compression $c_{r,opt}$ in
Eq.(2). For linearly discretized data with
$n_{bits} = \log_2 N_{max}$ bits, the Shannon entropy $h$ in Eq.[2]
depends only on the ratio of the digital resolution $\Delta$ to some
effective rms deviation $\sigma_e$:

$$ h \simeq \log_2\left( \sqrt{2\pi e}\; \sigma_e/\Delta \right), \tag{16} $$

with $\sigma_e^2 \equiv (\det C)^{1/N}$, where $C$ is the covariance
matrix for the (multi-Gaussian random) field $x_i$, i.e.
$C_{ij} = \langle x_i x_j \rangle$. In the case of the error
differences $\delta$ of Eq.[11], we have that
$\sigma_e = \sigma_\delta$ and therefore:

$$ h = \log_2\left( \sqrt{2\pi e}\; \sigma_\delta/\Delta \right). \tag{17} $$

For a data set with $n_{bits} = \log_2 N_{max}$ bits the optimal
compression rate in Eq.[2] is given by:

$$ c_{r,opt} = \frac{n_{bits}}{\log_2\left( \sqrt{2\pi e}\; \sigma_e/\Delta \right)}. \tag{18} $$

Thus, if we take $\Delta \simeq \sigma_\delta$ the optimal compression
is simply:

$$ c_{r,opt} = n_{bits}/\log_2(\sqrt{2\pi e}) \simeq 8, \tag{19} $$

where we have used $n_{bits} = 16$ as planned for the Planck LFI. This
very large compression rate can be obtained because there is a large
range of values $\sim \Delta\, 2^{n_{bits}}$ which has a very small
probability, and therefore can be easily compressed (e.g. by Huffman
or arithmetic coding). As mentioned above, the loss of sensitivity due
to digital distortion (i.e. Eq.[15]) is, in this case, 4% within this
approximation.

Another nice feature of our scheme is that higher (or lower)
compressions can be achieved if we are willing to reduce (or increase)
the final temperature sensitivity through the digital distortion. As
mentioned before, this could be related to the ADC specifications.
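The trade-off between compression and sensitivity can be tabulated directly from the small resolution formulae. The sketch below assumes white noise, $n_{bits} = 16$ and the relative resolution $\hat\Delta = \Delta/\sigma_\delta$; it is only meaningful for $\hat\Delta$ of order unity or smaller, and the function names are ours.

```python
import math

N_BITS = 16


def entropy_bits(delta_hat):
    """Entropy per sample, log2(sqrt(2 pi e) / delta_hat), for white Gaussian noise."""
    return math.log2(math.sqrt(2.0 * math.pi * math.e) / delta_hat)


def optimal_compression(delta_hat, n_bits=N_BITS):
    """Optimal compression rate n_bits / h."""
    return n_bits / entropy_bits(delta_hat)


def sensitivity_loss(delta_hat):
    """Relative sensitivity change sqrt(1 + delta_hat^2 / 12) - 1."""
    return math.sqrt(1.0 + delta_hat ** 2 / 12.0) - 1.0


table = [
    (d, optimal_compression(d), sensitivity_loss(d))
    for d in (0.6, 0.8, 1.0, 1.2, 1.4, 1.6)
]
for d, c_r, loss in table:
    print(f"delta_hat = {d:.1f}: c_r,opt = {c_r:5.2f}, sensitivity change = {loss:.3f}")
```

These are ideal values; an actual coder, such as the Huffman scheme used in the simulations, is expected to give somewhat lower compression factors.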

4. SIMULATIONS 

The process of generating, quantizing, storing, compressing, and
comparing the recovered and initial differences has been numerically
simulated. A set of $\delta_{k,\alpha}$'s, $\alpha = 1, \ldots, N$ for
fixed $k$, is produced as a random vector, say $\delta$, of Gaussian
components with a given variance $\sigma^2 = \sigma_\delta^2$. Next,
the vector is linearly discretized or quantized, according to a chosen
value of $\hat\Delta$, as explained in Romeo et al. (1999), yielding a
new, approximated, vector called $\hat\delta$, whose components are of
the form $\hat\delta_j = q_j \Delta$, where $q_j$ is an integer. The
set of values $q_j$, $j = 1, \ldots, N$, associated to each component,
is then stored into a third vector made of 16-bit integers, and
eventually written to a binary file, which is the object to be
actually compressed. A Huffman compression program which has been
specially adapted for 16-bit symbols is then applied to the file in
question, and the resulting compression factor, which is the quotient
between initial and final file sizes, is duly recorded. The obtained
compression factor is compared with the expected theoretical result
$c_{r,opt} = n_{bits}/h$, with $h$ given by Eq. (18) with
$\sigma_e = \sigma_\delta$.

Since Huffman compression involves no loss, the decoding procedure
amounts to recovering the $\hat\delta$ vector. Therefore, the
associated digital distortion error $D_{err}^2$ is nothing but the
average of the squared differences between the components of $\delta$
and those of $\hat\delta$, as stipulated by Eq. (12). This distortion
is numerically evaluated, and its value compared with the
small-$\hat\Delta$ approximation given by Eq. (13). Further, from the
simulations themselves we find $\hat\sigma_\delta$, and calculate the
sensitivity variation as defined in Eq. (15). The exact figures are
compared with the approximated part of the same equation.

This is illustrated by the example depicted in Fig. 1, which displays
a simulation with a Gaussian white noise vector of $N = 8700$
components, which corresponds to 1 minute of data at the 6.9 ms
sampling rate (i.e. one circle). Up to $\hat\Delta \simeq 1.2 - 1.5$,
the actual compression factors are just marginally smaller than the
theoretical or ideal ones. On the other hand, one can observe that the
small-$\hat\Delta$ predictions (solid lines) for distortion and
sensitivity changes happen to be quite accurate. The example shows
that $c_r \simeq 7.3$ for $\hat\Delta = 1$, with a relative
sensitivity decrease of
$\delta\hat T_{CMB}/\delta T_{CMB} - 1 \simeq 0.04$.

Fig. 1. Discretization and compression simulations for a set of
$N = 8700$ data values, quantized to different $\hat\Delta$-values.
The three plots correspond to: compression factor $c_r$ (top),
relative distortion error $D_{err}/\sigma_\delta$ (middle), and
relative sensitivity variation
$\delta\hat T_{CMB}/\delta T_{CMB} - 1$ (bottom). Square symbols
correspond to simulation results and the solid lines have been
obtained from the theoretical predictions $c_r \simeq 16/h$,
$h \simeq \log_2(\sqrt{2\pi e}/\hat\Delta)$ (top),
$D_{err}/\sigma_\delta \simeq \hat\Delta/\sqrt{12}$ (middle) and
$\sqrt{1 + \hat\Delta^2/12} - 1$ (bottom).

It is remarkable that the crude small-$\hat\Delta$ approximations that
have been applied work so well for this problem. To understand what
happens, we have calculated corrections to these predictions by
including:

- finite-sampling effects

- the contribution of $\langle\epsilon\,\delta\rangle$ to
  $\hat\sigma_\delta$

Since we are handling finite samples, the integrations or summations
of functions involving the probability distribution should be limited
to the range effectively spanned by the available values of our
stochastic variable. Given that we only have $N$ samples and a
resolution limited by the value of $\hat\Delta$, any magnitude of the
order of [...] will be indistinguishable from zero. Hence, the actual
range is just $[-n(\hat\Delta)\hat\Delta,\ n(\hat\Delta)\hat\Delta]$,
where $n(\hat\Delta)$ is determined by

$$ f\big(n(\hat\Delta)\,\hat\Delta\big) = [\ldots], \tag{20} $$

and $f$ is our Gaussian probability distribution function. This
equality leads to

$$ n(\hat\Delta) = \mathrm{Round}\,[\ldots]. \tag{21} $$



The $\langle\epsilon\,\delta\rangle$ correlation is so small that, up
to now, it has been regarded as a vanishing quantity. However, if we
take into account its nonzero value, the sensitivity variation will
have to be evaluated according to the first line of Eq. (15). Both
$D_{err}^2$ and $\langle\epsilon\,\delta\rangle$ have been calculated
as sums of integrations between consecutive $\delta_n$'s.
Nevertheless, these sums are not infinite, as $n$ ranges from
$n = -n(\hat\Delta)$ to $n = n(\hat\Delta)$. The integration over each
individual interval gives differences of incomplete gamma functions,
which have been numerically evaluated. Not surprisingly, the result of
applying all these corrections is very small indeed
(at A ~ 1, they are of the order of 10"^ - 10"''). The cor- 
rected curves have been drawn in Fig. 1 as dashed lines, 
but they just overlap the existing lines and can be hardly 
distinguished. 

Another possibility is to perform a nonlinear quantization or
discretization. Several tests have been made using a $\sinh(ax)$
response function, and changing the values of the nonlinearity
parameter $a$ (when $a \to 0$, the linear case is recovered). In
general, the compression rate increases, but the distortion becomes
higher as well. For instance, when $a = 2.5$, and the values of the
discretization parameter are comparable to $\hat\Delta \simeq 1$,
$c_r \simeq 11$ and $D_{err}/\sigma_\delta \simeq 0.6$ (with linear
discretization we had $c_r \simeq 7.3$ and
$D_{err}/\sigma_\delta \simeq 0.3$). Taking $a = 5.0$, we find
$c_r \simeq 13$ and $D_{err}/\sigma_\delta \simeq 1.6$. If we pick
nonlinear and linear cases giving the same $c_r$, the distortion
associated to the linear one is, in general, smaller. Another
disadvantage of nonlinear quantization is that the mean of the
discretized variable to be stored may be too sensitive to the minimum
and maximum values of $\eta$, which can keep changing at every new
set.
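The linear quantization experiment described in this section can be imitated with a compact script. This is a sketch under stated assumptions rather than the program used for Fig. 1: it builds a Huffman code from the empirical symbol counts with Python's `heapq`, counts only the coded payload (ignoring the code table and file headers), and uses a seed and function names of our choosing.

```python
import heapq
import math
import random
from collections import Counter


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols below this node).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node gets one more bit.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


def simulate(delta_hat, n=8700, n_bits=16, seed=0):
    """Compression factor for one circle of unit-variance white noise,
    quantized with relative resolution delta_hat and Huffman coded."""
    rng = random.Random(seed)
    q = [round(rng.gauss(0.0, 1.0) / delta_hat) for _ in range(n)]
    counts = Counter(q)
    lengths = huffman_code_lengths(counts)
    compressed_bits = sum(counts[s] * lengths[s] for s in counts)
    return n_bits * n / compressed_bits


cr_sim = simulate(1.0)
cr_theory = 16 / math.log2(math.sqrt(2.0 * math.pi * math.e))
print(f"Huffman c_r = {cr_sim:.2f}, ideal c_r = {cr_theory:.2f}")
```

With the resolution equal to the rms, the Huffman factor should come out somewhat below the ideal value of about 7.8, in line with the behaviour reported for Fig. 1.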

5. CONCLUSION 

We have considered several possible ways of reducing the 
size of the data on board the Planck satellite: 

- (a) Averaging. One could average the information in adjacent pixels
  within a circle or between consecutive images of the same pixel.

- (b) Changing the digital resolution, $\Delta$.

- (c) Doing lossless compression.

Because of the existence of possible instrument instabilities and 1/f
noise, doing (a) alone, i.e. just averaging, could result in a
dangerous decrease of the overall mission sensitivity. This is
illustrated in Eq.[8] but will be better quantified in future studies.
Instead of this, one might try to use a low digital resolution, which
should be balanced in order to maintain an acceptably low digital
distortion. A large digital distortion could bring about some loss of
sensitivity, but this is more controllable than losses due to
instrument instabilities or lack of diagnostic information. The amount
of possible lossless compression in (c) depends, in fact, on the
digital resolution and on the statistical nature of the signal (e.g.
its Shannon entropy). We have proposed to code the data in terms of
differences between consecutive circles at a given sky pointing. This
technique allows for lossless compression and introduces the
flexibility to combine the above methods in a reliable way, making
precise predictions of how data can be compressed and of how this
could change the final mission sensitivity due to digital distortion.

We have given some quantitative estimates of how the 
above factors can be used to address the problem of ob- 



taining the large compression rates required for Planck. 
For instance, one may observe the table below. 



A 


Cr 




0.6 


5.6 


0.01 


0.8 


6.5 


0.03 


1.0 


7.3 


0.04 


1.2 


8.0 


0.06 


1.4 


8.9 


0.07 


1.6 


9.6 


0.10 



The values are taken from the simulation results shown in
Fig. 1. We have listed (Huffman) compression factors and
sensitivity variations for given values of the relative
digital resolution parameter, $\Delta/\sigma$. At
$\Delta/\sigma = 1$, a compression rate of 7.3 has been
found, at the price of increasing the theoretical
(continuous) sensitivity by 4% due to the low digital
resolution. When $\Delta/\sigma = 1.6$, the compression
reaches 9.6 and the sensitivity changes by just 10%.
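
The quoted trend can be compared with two textbook estimates, sketched below in Python. The code assumes that the coded quantity is Gaussian with rms $\sigma$, that the raw samples occupy 16 bits (a hypothetical word length, not stated in this section), and that digitization adds uniform quantization noise of variance $\Delta^2/12$. The function names are illustrative. The entropy-based factor is an idealized estimate: a real Huffman code needs at least as many bits per sample as the entropy, so simulated factors are expected to be somewhat lower.

```python
import math


def entropy_bits(delta_over_sigma):
    """Approximate entropy (bits/sample) of Gaussian noise digitized in steps of delta."""
    return math.log2(math.sqrt(2.0 * math.pi * math.e) / delta_over_sigma)


def compression_estimate(delta_over_sigma, raw_bits=16):
    """Ideal compression factor: raw word length over entropy (raw_bits is an assumption)."""
    return raw_bits / entropy_bits(delta_over_sigma)


def sensitivity_increase(delta_over_sigma):
    """Fractional rms increase when uniform quantization noise (delta^2 / 12) is added."""
    return math.sqrt(1.0 + delta_over_sigma ** 2 / 12.0) - 1.0


# Columns: delta/sigma, ideal compression factor, fractional sensitivity change.
for ratio in (0.6, 0.8, 1.0, 1.2, 1.4, 1.6):
    print(f"{ratio:.1f}  {compression_estimate(ratio):5.2f}  {sensitivity_increase(ratio):.3f}")
```

Under these assumptions the estimated sensitivity increase is about 4% at $\Delta/\sigma = 1$ and about 10% at $\Delta/\sigma = 1.6$, in line with the values quoted in the text.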

If we want to approach a realistic modeling of the final
CMB map sensitivity, we will need to know in detail which
part of the on-board diagnostic information should be
downloaded to Earth. More work is needed to find an
optimal solution among the different strategies listed
above. The optimization will depend upon other critical
points of the mission that still need to be specified in
more detail, such as the survey and pointing strategy, the
instrumental performance, the final temperature (or
electric) data ranges, the analogue-to-digital converters
and the on-board calibration. We have argued that our
proposal of coding and compressing the data in terms of
differences of consecutive circles at a given sky pointing
has many advantages and is a first step towards this
optimization.

Acknowledgments 

We would like to thank P. Fosalba, J.M. Herreros,
R. Hoyland, R. Rebolo, R.A. Watson, S. Levin and
A. de Oliveira-Costa for discussions. This work was in
part supported by Comissionat per a Universitats i
Recerca, Generalitat de Catalunya, grants ACES97-22/3,
ACES98-2/1 and 1998BEAI400208, by the Spanish "Plan
Nacional del Espacio", CICYT, grant ESP96-2798-E, and by
DGES (MEC, Spain), project PB96-0925.

References 

Bersanelli, M. et al., COBRAS/SAMBA, Phase A Study for an
    ESA M3 Mission, ESA report D/SCI(96)3.
Blum, E.J., Annales d'Astrophysique, 22-2, 140, 1959.
Gaztañaga, E., Barriga, J., Romeo, A., Fosalba, P., Elizalde,
    E. 1998, Data compression on board the PLANCK Satellite
    Low Frequency Instrument: optimal compression rate, Ap.
    Let. Com. in press, astro-ph/9810205.
Herreros, J.M., Hoyland, R., Rebolo, R., Watson, R.A., 1997,
    Ref.: LFI-IAC-TNT-001.
Mandolesi, N. et al., 1998, LFI for Planck, a proposal to
    ESA's AO.






Maris, M., Planck LFI Consortium Meeting, Florence, 1999
    March 25-26.
Romeo, A., Gaztañaga, E., Barriga, J., Elizalde, E. 1998,
    Information content in Gaussian noise: optimal compression
    rates, International Journal of Modern Physics C, in press;
    physics/9809004.
Gersho, A. and Gray, R.M., Vector Quantization and Signal
    Compression, Kluwer Acad. Press, 1992.

A. Appendix: RMS noise in a Gaussian difference 



f_knee = 0.06 Hz                  f_knee = 0.005 Hz
f_min (Hz)       ρ                f_min (Hz)       ρ
1/7200           9.8 × 10^-?      1/7200           8.1 × 10^-?
3.17 × 10^-8     4.4 × 10^-?      3.17 × 10^-8     3.7 × 10^-?



where

$$ \mathrm{ci}(x) = -\int_x^{\infty} \frac{\cos t}{t}\, dt . $$

We want to know whether the correlation due to $1/f$ noise
is important in our data handling model, so let us
calculate some specific examples. In our model $f_{\max}$
is given by the inverse of the sampling time, and
$f_{\min}$ is the inverse of two hours (if calibration
occurs at every pointing) or the inverse of the mission
time (about 1 year). In Table 2 we have computed the
magnitude of $\rho$ for our model and for two different
values of the calibration time, i.e. $1/f_{\min}$. The
table shows how small the values of $\rho$ are compared to
unity. Given the precision needed for the entropy and
compression factors, such contributions from $\rho$ are
negligible.



Table 2. Values of the autocorrelation $\rho$ between
consecutive sky pixels for different total calibration
times, $1/f_{\min}$.



We can model the process of differencing as the
subtraction of two Gaussian random variables, $\eta_1$ and
$\eta_2$, with variances $\sigma_1^2$ and $\sigma_2^2$. The
probability density distribution for the difference random
variable $\delta = \eta_2 - \eta_1$ is also a Gaussian
distribution, with a new variance $\sigma_\delta^2$:

$$ \sigma_\delta^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2 . $$

For a wide sense stationary process $\sigma_1 = \sigma_2 =
\sigma$ and $\sigma_\delta^2 = 2\sigma^2 (1 - \rho)$. In
this way one can also obtain the expression for the entropy
of the distribution,

$$ h \simeq \log_2 \left( \sqrt{2\pi e}\; \sigma_\delta / \Delta \right) . $$
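
A quick numerical check of the variance of a Gaussian difference, and of how weakly the entropy depends on a small correlation, can be sketched in Python as follows. The sample size, the seed and the values of $\rho$ used here are arbitrary choices for illustration (they are not the entries of Table 2), and the helper names are hypothetical.

```python
import math
import random


def difference_variance(sigma, rho):
    """Variance of eta2 - eta1 for equal variances sigma^2 and correlation rho."""
    return 2.0 * sigma ** 2 * (1.0 - rho)


def entropy_shift_bits(rho):
    """Change in h = log2(sqrt(2 pi e) sigma_delta / Delta) relative to rho = 0."""
    return 0.5 * math.log2(1.0 - rho)


def simulated_difference_variance(sigma, rho, n=200_000, seed=0):
    """Monte Carlo estimate using two Gaussian variables with correlation rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eta1 = rng.gauss(0.0, sigma)
        # eta2 has variance sigma^2 and correlation rho with eta1.
        eta2 = rho * eta1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, sigma)
        total += (eta2 - eta1) ** 2
    return total / n


sigma, rho = 1.0, 0.3
predicted = difference_variance(sigma, rho)
measured = simulated_difference_variance(sigma, rho)
```

For a correlation as small as $\rho = 10^{-3}$, `entropy_shift_bits` gives a change of less than a thousandth of a bit, which illustrates why small correlations have a negligible effect on the compression factor.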

We want to take differences of data separated by
$\tau = 1$ minute, which corresponds to the same sky
position. Bearing in mind that our model is a first order
Markov process, $\rho$ will be equal to the correlation
between pixels separated by 1 min., that is
$\rho\sigma^2 = C(\tau = 1\ \mathrm{min.})$ (recall that
the correlation matrix for a wide sense stationary
stochastic process is a symmetric Toeplitz matrix and so
it depends only on index differences). Thus the two-point
correlation is:
$$ C(\tau) = \int_{-\infty}^{+\infty} P(f)\, e^{2\pi i f \tau}\, df . $$

Next, we are going to estimate this correlation for a
power spectrum $P(f)$ of the type of white noise plus
$1/f$ (i.e. $P(f)$ in Eq.[7]). In practice our spectrum
will not run over the whole range but only over a limited
interval $(f_{\min}, f_{\max})$. The final result, for
$\tau \neq 0$, is:

$$ C(\tau) = 2A \left[ \frac{\sin(2\pi f \tau)}{2\pi\tau} + f_{\rm knee}\, \mathrm{ci}(2\pi f \tau) \right]_{f_{\min}}^{f_{\max}} $$

This article was processed by the author using Springer-Verlag
A&A style file L-AA version 3.



